Combined Multi-Channel NMF-Based Robust Beamforming for Noisy Speech Recognition
نویسندگان
چکیده
We propose a novel acoustic beamforming method using blind source separation (BSS) techniques based on non-negative matrix factorization (NMF). In conventional mask-based approaches, hard or soft masks are estimated and beamforming is performed using speech and noise spatial covariance matrices calculated from masked noisy observations, but the phase information of the target speech is not adequately preserved. In the proposed method, we perform complex-domain source separation based on multi-channel NMF with rank-1 spatial model (rank-1 MNMF) to obtain a speech spatial covariance matrix for estimating a steering vector for the target speech utilizing the separated speech observation in each time-frequency bin. This accurate steering vector estimation is effectively combined with our novel noise mask prediction method using multi-channel robust NMF (MRNMF) to construct a Maximum Likelihood (ML) beamformer that achieved a better speech recognition performance than a state-of-the-art DNN-based beamformer with no environment-specific training. Superiority of the phase preserving source separation to real-valued masks in beamforming is also confirmed through ASR experiments.
منابع مشابه
Noise Robust Speech Recognition Using Multi-Channel Based Channel Selection And ChannelWeighting
In this paper, we study several microphone channel selection and weighting methods for robust automatic speech recognition (ASR) in noisy conditions. For channel selection, we investigate two methods based on the maximum likelihood (ML) criterion and minimum autoencoder reconstruction criterion, respectively. For channel weighting, we produce enhanced log Mel filterbank coefficients as a weight...
متن کاملFeature mapping using far-field microphones for distant speech recognition
Acoustic modeling based on deep architectures has recently gained remarkable success, with substantial improvement of speech recognition accuracy in several automatic speech recognition (ASR) tasks. For distant speech recognition, the multi-channel deep neural network based approaches rely on the powerful modeling capability of deep neural network (DNN) to learn suitable representation of dista...
متن کاملA multi-channel speech enhancement framework for robust NMF-based speech recognition for speech-impaired users
In this paper a multi-channel speech enhancement framework for distant speech acquisition in noisy and reverberant environments for Non-negative Matrix Factorization (NMF)-based Automatic Speech Recognition (ASR) is proposed. The system is evaluated for its use in an assistive vocal interface for physically impaired and speech-impaired users. The framework utilises the Spatially Pre-processed S...
متن کاملOn Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones
We design a novel deep learning framework for multi-channel speech recognition in two aspects. First, for the front-end, an iterative mask estimation (IME) approach based on deep learning is presented to improve the beamforming approach based on the conventional complex Gaussian mixture model (CGMM). Second, for the back-end, deep convolutional neural networks (DCNNs), with augmentation of both...
متن کاملMulti-channel Noise Reduction in Noisy Environments
Multi-channel noise reduction has been widely researched to reduce acoustic noise signals and to improve the performance of many speech applications in noisy environments. In this paper, we first introduce the state-ofthe-art multi-channel noise reduction methods, especially beamforming based methods, and discuss their performance limitations. Subsequently, we present a multi-channel noise redu...
متن کامل